Git and GitHub Tutorial

Overview of Git and GitHub

At a high level, what are git and GitHub?

  • git: a version control system that allows you to track changes in your code
  • GitHub: a platform that allows you to host your git repositories online/remotely

There are many possible starting points for creating/initializing a GitHub repository:

  A. Start with an existing remote repository from GitHub;
  B. Create a new remote repository on GitHub; or
  C. Start with an existing local repository on your computer.

In this walkthrough, we will be setting up two GitHub repositories:

  1. dsip-s26: repository with course materials (lectures, code, etc.)
    • To set up this dsip-s26 repository, we will use option (A) above.
    • You won’t be interacting with this repository much besides pulling to receive course materials.
  2. dsip: your repository for your own work (e.g., labs, final project)
    • To set up this dsip repository, we will use option (B) above.
    • This is the repository that you will be interacting with the most.

Instructions to set up the dsip-s26 repository

In your terminal:

  1. Navigate to the directory where you want to store the course materials, e.g.,
cd path/to/directory
  2. Clone the dsip-s26 repository by running the following command:
git clone https://github.com/tiffanymtang/dsip-s26.git

Note: This will create a new directory called dsip-s26 in your current working directory. To see this, you can run ls (which lists the contents of the current directory).

  3. To update the course materials at any point during the semester, you should navigate into the dsip-s26 directory, e.g.,
cd dsip-s26

and run

git pull
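To see what cloning and pulling do without touching the real course repository, here is a minimal offline sketch. It assumes git is installed and uses a local bare repository (course.git, a hypothetical name) as a stand-in for https://github.com/tiffanymtang/dsip-s26.git; a second clone named instructor (also hypothetical) plays the role of the instructor publishing new materials.

```shell
#!/bin/sh
# Offline sketch: a local bare repository stands in for the GitHub remote.
# All directory names below are hypothetical placeholders.
set -eu

# Throwaway identity so commits work without touching your global git config.
export GIT_AUTHOR_NAME="Example Instructor" GIT_AUTHOR_EMAIL="instructor@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

WORK="$(mktemp -d)"
cd "$WORK"

# Build the stand-in remote (course.git) with one initial commit.
git init --quiet seed
echo "# dsip-s26" > seed/README.md
git -C seed add README.md
git -C seed commit --quiet -m "Initial commit"
git clone --quiet --bare seed course.git

# Student: clone once (plays the role of `git clone https://github.com/...`).
git clone --quiet course.git dsip-s26

# Instructor: publish new materials to the remote.
git clone --quiet course.git instructor
echo "Lecture 1 notes" > instructor/lecture01.txt
git -C instructor add lecture01.txt
git -C instructor commit --quiet -m "Add lecture 1"
git -C instructor push --quiet

# Student: move into the clone and pull to receive the new materials.
cd dsip-s26
git pull --quiet
ls    # README.md and the newly pulled lecture01.txt
```

With the real repository, you would use the GitHub URL in the clone step and then simply run git pull inside dsip-s26 whenever new materials are posted.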
Alternatively, in GitKraken:

  1. Open GitKraken and click on the “Clone a repo” button.

  2. In the URL field, enter the following URL: https://github.com/tiffanymtang/dsip-s26. You can select where you want to store this repository on your computer by clicking on the “Browse” button next to “Where to clone to”. Once you are satisfied with the location, click on the “Clone the repo!” button.

  3. If a pop-up appears asking you whether to open the dsip-s26 repository, go ahead and click on the “Open Now” button.

  4. To update the course materials at any point during the semester, click on the “Pull” button at the top of the application.

Instructions to set up your dsip repository

Next, we will create your personal dsip repository, which you will use to work on your labs. Unlike the dsip-s26 repository, which already existed on GitHub (so you only had to clone it locally), you will create your dsip repository from scratch on GitHub.

  1. Go to: https://github.com/ and log in.

  2. Click on the green “New” button (on the left) to create a new repository.

  3. Fill in the following information:

    • Owner: your GitHub username
    • Repository name: dsip
    • Public or Private: Please choose “Private” so that only you (and your added collaborators) can see your repository.
    • Initialize this repository with: I would recommend checking the box for “Add a README file” so that you can easily clone the repository to your computer.
    • Add .gitignore: For now, you can leave this as “None”.
    • Add a license: I would recommend selecting “MIT License” from the dropdown menu, but this is optional.

  4. Click on the green “Create repository” button.

  5. Once you have created the repository, you will be taken to the repository’s main page. Next, we need to “clone” this (remote) repository to your local computer, just as we did with the dsip-s26 repository, by following the same steps as before:

In your terminal:

  1. Navigate to the directory where you want to store your dsip repository, e.g.,
cd path/to/directory
  2. Clone the dsip repository by running the following command:
git clone https://github.com/{your_github_username}/dsip.git

Note: This will create a new directory called dsip in your current working directory. To see this, you can run ls (which lists the contents of the current directory).

Alternatively, in GitKraken:

  1. Open GitKraken and click on the “Clone a repo” button.

  2. In the URL field, enter the following URL: https://github.com/{your_github_username}/dsip. You can select where you want to store this repository on your computer by clicking on the “Browse” button next to “Where to clone to”. Once you are satisfied with the location, click on the “Clone the repo!” button.

  3. If a pop-up appears asking you whether to open the dsip repository, go ahead and click on the “Open Now” button.

So far, we’ve set up two different GitHub repositories. Next, using your dsip repository, we will go over how to make changes to a repository and how to push these changes to GitHub.

A typical GitHub workflow

A typical GitHub workflow involves the following four commands:

  1. First, git pull to download changes from the remote GitHub repository to your local computer
  2. After making changes to your local repository, git add files that you’d like to stage for your next commit
  3. Next, git commit to store a “snapshot” of these added changes in your git version history
  4. Finally, git push to upload these local changes to the remote GitHub repository
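The four commands can be rehearsed end to end on your own machine. This is a sketch under stated assumptions, not the course's setup: a local bare repository (remote.git) stands in for your GitHub dsip repository, a throwaway identity is set through environment variables so that commits work without changing your global git configuration, and change.txt is a hypothetical file.

```shell
#!/bin/sh
# Offline sketch of the pull -> add -> commit -> push cycle.
# remote.git stands in for GitHub; every name here is a hypothetical placeholder.
set -eu
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

WORK="$(mktemp -d)"
cd "$WORK"

# Stand-in remote initialized with a README (like GitHub's "Add a README file" option).
git init --quiet seed
echo "# dsip" > seed/README.md
git -C seed add README.md
git -C seed commit --quiet -m "Initial commit"
git clone --quiet --bare seed remote.git
git clone --quiet remote.git dsip
cd dsip

git pull --quiet                          # 1. download any remote changes
echo "hello" > change.txt
git add change.txt                        # 2. stage the new file
git commit --quiet -m "add change.txt"    # 3. record a snapshot in the history
git push --quiet                          # 4. upload the commit to the remote
```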

To see this workflow in action, let’s make a minor change to our dsip repository. In particular, let’s create a new text file called info.txt that contains the following two lines:

name = "Your Name"
github_name = "Your GitHub Username"

Please place this info.txt file in your dsip folder (i.e., the file path should be dsip/info.txt).
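If you prefer the terminal to a text editor, one way to create info.txt with exactly these two lines is a quoted heredoc. The sketch assumes a POSIX shell; a temporary directory stands in for the folder that contains your dsip clone, and the values are the same placeholders as above.

```shell
#!/bin/sh
# Sketch: create dsip/info.txt from the terminal.
# A temporary directory stands in for the folder containing your dsip clone.
set -eu
cd "$(mktemp -d)"
mkdir -p dsip   # already exists if you cloned dsip; created here only for the demo

# A quoted heredoc ('EOF') writes the lines verbatim, including the double quotes.
cat > dsip/info.txt <<'EOF'
name = "Your Name"
github_name = "Your GitHub Username"
EOF

cat dsip/info.txt
```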

Let’s now go through the four steps of the GitHub workflow, showing the terminal command for each step alongside the equivalent action in GitKraken.

First, navigate to the desired repository (i.e., your dsip repository):

    • Terminal:
cd path/to/dsip
    • GitKraken: Open your dsip repository in GitKraken (e.g., using the “Browse for a repo” button).


  1. To pull:
    • Terminal:
git pull
    • GitKraken: Click on the “Pull” button at the top of the application.

Recall: “pulling” is the process of downloading changes from the remote GitHub repository to your local computer.


  2. To add new/modified files to the staging area:
    • Terminal:
git add info.txt
    • GitKraken: Click on the “Stage File” button next to the file(s) that you want to add to the staging area. This moves the file(s) from the “Unstaged Files” section to the “Staged Files” section.

You may want to check the status of your git repository using git status to see which files have been modified and/or added to the staging area. It is common to run git status before and/or after each step of this workflow when first learning git.
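The effect of staging can be observed directly with git status --short, which prints a two-character status code before each path. In this minimal sketch, a throwaway repository stands in for your dsip clone: info.txt first appears as ?? (untracked) and, after git add, as A (added to the staging area).

```shell
#!/bin/sh
# Sketch: watch a file move from "untracked" to "staged" with git status.
# A throwaway repository stands in for your dsip clone.
set -eu
cd "$(mktemp -d)"
git init --quiet dsip
cd dsip
printf 'name = "Your Name"\n' > info.txt

git status --short      # "?? info.txt": untracked, not yet staged
git add info.txt
git status --short      # "A  info.txt": staged as a new file, ready to commit
```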


  3. To commit the staged files (with a message/description):
    • Terminal:
git commit -m "add info.txt"
    • GitKraken: Add a commit message to the “Commit summary” field. Once you are satisfied with the message, click on the “Commit changes” button.

Tip: It is good practice to keep your commits modular and focused (e.g., each commit should fix one bug or add one feature). This makes it easier to track version changes and to revert to previous versions if needed. To help with this, you should also write informative commit messages that describe the changes you made in each commit.
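To illustrate the tip above, the sketch below makes two small, focused commits rather than one catch-all commit and then prints the history with git log --oneline. The repository, identity, lab1/README.md file, and commit messages are all hypothetical.

```shell
#!/bin/sh
# Sketch: two focused commits instead of one catch-all commit.
# Throwaway repo and identity; file names and messages are hypothetical.
set -eu
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init --quiet dsip && cd dsip

# Change 1: only the info file.
printf 'name = "Your Name"\n' > info.txt
git add info.txt
git commit --quiet -m "Add info.txt with name and GitHub username"

# Change 2: an unrelated lab file gets its own commit.
mkdir -p lab1
echo "# Lab 1" > lab1/README.md
git add lab1/README.md
git commit --quiet -m "Add README for lab 1"

# One line per commit, newest first; each message describes exactly one change.
git log --oneline
```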


  4. To push:
    • Terminal:
git push
    • GitKraken: Click on the “Push” button at the top of the application. After you click on “Push”, the head of the local repository (computer icon) and the head of the remote repository (your GitHub icon) should be aligned at the same commit.

Recall: “pushing” is the process of uploading changes from your local computer to the remote GitHub repository. If you do not push your changes, they will not be reflected on GitHub and will not be accessible to collaborators.
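In the terminal, the counterpart of checking that GitKraken's local and remote icons line up is to compare your branch with its upstream branch on origin. The minimal sketch below assumes a local bare repository (remote.git) standing in for GitHub and a throwaway identity; it prints how many commits the local branch is behind and ahead of the remote, before and after git push.

```shell
#!/bin/sh
# Sketch: terminal check that local and remote heads line up after a push.
# remote.git stands in for GitHub; names and identity are hypothetical.
set -eu
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
WORK="$(mktemp -d)"
cd "$WORK"

# Stand-in remote with an initial README commit, then a local clone.
git init --quiet seed
echo "# dsip" > seed/README.md
git -C seed add README.md
git -C seed commit --quiet -m "Initial commit"
git clone --quiet --bare seed remote.git
git clone --quiet remote.git dsip
cd dsip

printf 'name = "Your Name"\n' > info.txt
git add info.txt
git commit --quiet -m "add info.txt"

# Prints "behind<TAB>ahead" relative to the upstream: 0 behind, 1 ahead before pushing.
git rev-list --left-right --count '@{u}...HEAD'
git push --quiet
# After pushing, both counts are 0: local and remote heads point at the same commit.
git rev-list --left-right --count '@{u}...HEAD'
```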


Lastly, please add tiffanymtang and caiyufei8 as collaborators on your dsip repository so that the grader and I can view your lab submissions. To do this:

  1. Go to your dsip repository on GitHub: https://github.com/{your_github_username}/dsip
  2. Go to Settings (on the top) > Collaborators (on the left) > Add people (the green button) > Enter tiffanymtang > Click on “Add tiffanymtang to this repository”.
  3. Repeat the same process to add caiyufei8 as a collaborator.

.gitignore

As you begin working on your labs and final project, you will likely generate some files that you do not want to track with git (e.g., data files, temporary files, compiled files, etc.). For example, the .DS_Store file is a hidden “junk” file that is created by macOS and should not be tracked. Python also generates __pycache__ folders when compiling code, and Jupyter notebooks generate .ipynb_checkpoints folders when running notebooks. These files/folders are not necessary to track and will just clutter your repository.

We can instruct git to ignore these files by creating a .gitignore file in our repository. This file contains a list of files and directories that we want git to ignore and never track.

If you followed the R parts of this walkthrough, then a .gitignore file has already been created automatically (by renv). To find this file in your file manager, you will need to show hidden files (i.e., any files whose names start with .). On a Mac, press Cmd+Shift+. in Finder; on Linux, many file managers use Ctrl+H; on Windows, turn on hidden items from File Explorer's View menu. If a .gitignore has not yet been created, you can create one manually by opening your favorite text editor and saving an empty file with the name .gitignore.
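From the terminal, you can create the file with touch and reveal hidden files with ls -a. This sketch assumes a POSIX shell and uses a temporary directory in place of your dsip folder.

```shell
#!/bin/sh
# Sketch: create an empty .gitignore from the terminal and confirm it exists.
# A temporary directory stands in for your dsip folder.
set -eu
cd "$(mktemp -d)"
mkdir dsip && cd dsip

touch .gitignore   # creates an empty file; an existing file's contents are left unchanged
ls                 # plain ls hides names that start with "."
ls -a              # -a also lists hidden entries such as .gitignore
```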

To add the .DS_Store file to the .gitignore file, you can open the .gitignore file in your text editor and add the following line:

*.DS_Store

Note: the * is a wildcard character that matches any sequence of characters (including an empty one). So *.DS_Store matches any file whose name ends with .DS_Store, including a file named exactly .DS_Store, in any folder of your repository; adding the above line to your .gitignore therefore tells git to ignore all such files.
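To check which paths a pattern actually matches, git provides git check-ignore; with -v it also prints the .gitignore line responsible for the match. The sketch below uses a throwaway repository and hypothetical file names to confirm that *.DS_Store matches .DS_Store both at the top level and inside a subfolder, while an unrelated file such as notes.txt is not ignored.

```shell
#!/bin/sh
# Sketch: confirm which paths the *.DS_Store pattern matches.
# Throwaway repository; file names are hypothetical examples.
set -eu
cd "$(mktemp -d)"
git init --quiet dsip && cd dsip
echo '*.DS_Store' > .gitignore
mkdir -p lab1
touch .DS_Store lab1/.DS_Store notes.txt

# check-ignore prints each given path that is ignored; -v adds the matching rule.
# It exits non-zero when none of the paths is ignored, hence "|| true".
git check-ignore -v .DS_Store lab1/.DS_Store notes.txt || true
```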

Some other files/folders that you should add to your .gitignore file include:

*/data/*
*__pycache__*
*.ipynb_checkpoints*
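The same git check-ignore command can test all of the suggested patterns at once. In this sketch (throwaway repository, hypothetical folder and file names), the loop reports whether each path would be ignored. One detail worth noticing: in a .gitignore pattern, * does not match a /, and a pattern with a / in the middle is anchored to the folder holding the .gitignore, so */data/* covers a data/ folder exactly one level down (such as lab1/data/) but not a data/ folder at the top of the repository.

```shell
#!/bin/sh
# Sketch: check the suggested .gitignore patterns against example paths.
# Folder and file names are hypothetical examples.
set -eu
cd "$(mktemp -d)"
git init --quiet dsip && cd dsip
printf '%s\n' '*.DS_Store' '*/data/*' '*__pycache__*' '*.ipynb_checkpoints*' > .gitignore

mkdir -p lab1/data lab1/__pycache__ lab1/.ipynb_checkpoints data
touch lab1/data/raw.csv lab1/__pycache__/utils.cpython-312.pyc \
      lab1/.ipynb_checkpoints/lab1-checkpoint.ipynb lab1/lab1.ipynb data/raw.csv

for path in lab1/data/raw.csv lab1/__pycache__/utils.cpython-312.pyc \
            lab1/.ipynb_checkpoints/lab1-checkpoint.ipynb lab1/lab1.ipynb \
            data/raw.csv; do
  # -q: no output, only the exit status (0 means the path is ignored).
  if git check-ignore -q "$path"; then
    echo "ignored: $path"
  else
    echo "tracked: $path"
  fi
done
```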

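It is generally best practice to avoid pushing large data files to GitHub repositories; hence, here we are ignoring all files inside a data/ folder that sits one level below the top of your repository (e.g., lab1/data/). Note that */data/* does not match a data/ folder at the top level of the repository, or one nested more deeply; a pattern such as data/ would match a data/ folder at any depth. Avoid uploading the datasets to GitHub for your labs!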

For reference, GitHub has a file size limit of 100 MB per file, and large files (even those under this limit) can noticeably slow down cloning, pulling, and pushing your repository. If you try to push a file that exceeds the limit, GitHub will reject the push, and removing the file from your commit history afterwards can be painful (in the worst case, leading to lost work).
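Before staging a new batch of files, it can help to list anything unusually large. This sketch uses find with a hypothetical threshold (LIMIT_MB, set low here so the demo stays small; a value such as 50 would be more realistic) and skips git's internal .git folder. The directory layout and the big.bin file are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: list files above a size threshold before you stage them.
# LIMIT_MB is a hypothetical, adjustable threshold (GitHub's hard limit is 100 MB per file).
set -eu
cd "$(mktemp -d)"
mkdir -p dsip/lab1/data
head -c 2097152 /dev/zero > dsip/lab1/data/big.bin   # 2 MiB demo file
echo "small" > dsip/lab1/notes.txt

LIMIT_MB=1
# -size +NM matches files larger than N MiB; -prune skips the .git folder.
find dsip -path 'dsip/.git' -prune -o -type f -size "+${LIMIT_MB}M" -print
```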

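After these changes, your .gitignore file should contain at least the following lines (in addition to any entries that were already there):

*.DS_Store
*/data/*
*__pycache__*
*.ipynb_checkpoints*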

Please save these changes to your .gitignore file. After saving, you can check the status of your repository again (e.g., with git status) to see that many of the files that you previously saw (e.g., .DS_Store, the data files, …) are no longer listed as untracked, because git now ignores them.
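To see the effect of .gitignore on git status, the sketch below compares the output before and after adding two patterns, then uses git status --ignored, which marks ignored paths with !!. The repository and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare git status before and after writing a .gitignore.
# Throwaway repository; file names are hypothetical examples.
set -eu
cd "$(mktemp -d)"
git init --quiet dsip && cd dsip
mkdir -p lab1/data
touch .DS_Store lab1/data/raw.csv lab1/lab1.ipynb

git status --short --untracked-files=all            # every file shows up as "??" (untracked)
printf '%s\n' '*.DS_Store' '*/data/*' > .gitignore
git status --short --untracked-files=all            # .DS_Store and raw.csv no longer listed
git status --short --ignored --untracked-files=all  # "!!" marks the paths now being ignored
```

Keep in mind that .gitignore only applies to files git is not already tracking. If a file such as .DS_Store was committed before you added the pattern, it stays tracked until you remove it from the index (for example, with git rm --cached .DS_Store) and commit that change.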

Take one last moment to check that all of the files remaining in your git status (or GitHub Desktop/GitKraken status view) are files that you’d like to commit and push to your GitHub repository. If you are satisfied with the files that you see, you can proceed through the usual GitHub workflow of pulling, adding, committing, and pushing your changes to your GitHub repository.

Git Cheat Sheet

For a quick reference guide to common git/GitHub commands, please refer to this GitHub Cheat Sheet.